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Abstract: An efficient means to map tree plantations is needed to detect tropical land use 
change and evaluate reforestation projects. To analyze recent tree plantation expansion in 
northeastern Costa Rica, we examined the potential of combining moderate-resolution 
hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data 
(Landsat) to accurately classify (1) general forest types and (2) tree plantations by species 
composition. Following a linear discriminant analysis to reduce data dimensionality, 
we compared four Random Forest classification models: hyperspectral data (HD) alone; 
HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; 
and all three models combined. The fourth, combined model achieved overall accuracy of 
88.5%. Adding multitemporal data significantly improved classification accuracy (p < 0.0001) 




Remote Sens. 2015 , 7 


5661 


of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral 
data alone classified six species of tree plantations with 75% to 93% producer’s accuracy; 
adding multitemporal spectral data increased accuracy only for two species with dense 
canopies. Non-native tree species had higher classification accuracy overall and made up the 
majority of tree plantations in this landscape. Our results indicate that combining 
occasionally acquired hyperspectral data with widely available multitemporal satellite 
imagery enhances mapping and monitoring of reforestation in tropical landscapes. 

Keywords: hyperspectral fusion; Landsat; Costa Rica; reforestation; secondary forests; 
payments for environmental services (PES); tree plantations; remote sensing 


1. Introduction 

Tree plantations and tree crops are expanding rapidly across the globe in response to rising demand 
for wood pulp, timber, fruit, and vegetable oil [1,2]. However, the current distribution of tree plantations 
and tree crops is often estimated from government documents reporting planted areas that are of variable 
accuracy and lack consistent monitoring [3-6]. Global and regional maps of tree plantations and tree crops 
using readily available satellite imagery are greatly needed to improve estimates of carbon storage and 
monitor land-cover change, particularly in tropical regions where extensive field inventories are rare [3,5], 

In this study, we assess the potential of an integration of hyperspectral and multitemporal 
multispectral imagery to aid regional reforestation monitoring by distinguishing tropical tree plantations 
from other forest types and identifying tree plantations at the species level. Most region-scale maps of 
forest cover based on multispectral or SAR satellite imagery do not distinguish tree plantations and tree 
crops from other forest types [7,8] and often combine them with secondary forests [6,9-13], This 
consolidation of forest types can oversimplify and mask key ecological differences that are important 
for monitoring forest change, biodiversity, and carbon stocks. Tree plantations and tree crops support 
much lower levels of biodiversity than mature and secondary forests, and potentially have higher rates 
of clearing for timber management and soil carbon emissions [14-16], 

Different tree plantation species can also have distinct impacts on local ecosystem services and 
biodiversity. For example, Eucalyptus spp. and Pinus spp. are prone to burning in dry regions, which 
increases carbon emissions [17] and species within plantations differ in carbon sequestration rates [18]. 
Plantation species with open or structurally complex canopies can promote dense native understory 
regrowth and forest connectivity [19-21]. Conversely, fast-growing tree species with dense shade can 
function as “biological deserts” with very low biological diversity [22], Similarly, crop species like oil 
palm, rubber, and fruit trees are often intensively managed in monoculture plantations [2,23], 

In Costa Rica, despite an intense interest in REDD+ and substantial government investment [24], 
the degree of success of reforestation programs in increasing the cover of tree plantations is not well 
known. Government payments to establish tree plantations during the 1980s led to a large pulse of 
reforestation [9,25-30], and the recent 1996 Forestry Law (no. 7575) further encouraged tree plantation 
establishment through payments for environmental services (PES) [31-35], Northeastern Costa Rica in 
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particular has received extensive reforestation payments, in part because of efforts to increase habitat 
connectivity in the San Juan-La Selva Biological Corridor (SJLSBC; Figure 1) [32]. 
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Figure 1 . Map of the study area in lowland Costa Rica (10.42'N-84.00'E), showing the 
San Juan-La Selva Biological Corridor (red line, 2466 km 2 ), the hyperspectral imagery 
mosaic, and parks and wildlife refuges (shades of grey). La Selva Biological Station is 
centrally located in the imagery (a 4509 km 2 mosaic). 

Previous satellite image-based land-cover maps of the SJLSBC have not successfully distinguished 
tree plantations or individual tree plantation species from other forest types at regional spatial 
extents [9,36-38]. However, high loss rates of reforestation (a land cover category including both 
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secondary forests and tree plantations) have been observed in this region [9,25,29]. As a result, 
the long-term persistence and composition of tree plantations in this landscape remains unclear [9]. 

Hyperspectral imagery may improve our ability to accurately distinguish tropical tree plantations 
from other forest types and to identify species within plantations [39]. Hyperspectral imagery has been 
used to identify tree species occurring within temperate and tropical forests [40-42], Most subtropical 
and tropical studies to date have relied on high-resolution hyperspectral imagery to identify rainforest and 
savanna tree species at the individual tree canopy scale [39,43^47] . Moderate-resolution (10-30 m pixels) 
hyperspectral imagery has been used to successfully distinguish emergent forest canopy trees in Peru [48], 
invasive tree species in Hawaii [49], Indian tree species [50], and mangrove forest species [51]. A small 
number of studies have used high-resolution data to determine tree species within already-delineated 
timber plantations as part of timber inventories [52,53]. 

Despite the potential for identifying forest species with hyperspectral imagery, the degree to which 
tropical forest types and tree plantation species composition can be mapped remains unclear. The relatively 
few studies that have attempted to distinguish tropical secondary forests from mature tropical forests 
using hyperspectral data have had variable accuracy [54-56]. Distinguishing tree plantations from other 
forest types may present an additional challenge. Spectral reflectance of tree plantations can be quite 
variable even within the same tree species, changing with plantation nutrition and disease status [57], 
degree of canopy disturbance [58], underlying soil type [59], and plantation age [52], Tree plantation 
classification accuracy across landscapes is highly variable in the multispectral sensor literature, with 
frequent confusion between plantations and both secondary and mature forests, respectively [9,60-67], 

Some non-native species, such as pine, acacia, and eucalyptus, have been classified with high overall 
accuracy (>80%) using multispectral data [60-62], implying that accurate plantation discrimination may 
be less challenging when tree plantations and tree crops are not comprised of or mixed with native 
species or when spectral contrast with local native vegetation or agriculture is high. In northeastern Costa 
Rica, Fagan et al. [9] were able to classify non-native tree plantations with high accuracy using 
multispectral data, but native tree plantations were typically confused with secondary forests (<65% 
accuracy). Sesnie et al. [37], working in the same region, was able to classify a general tree plantation 
class with -90% accuracy using Landsat and ancillary data and a labor-intensive image-subsetting 
technique; however, class accuracy decreased to 55% when implemented at larger image extents. 
Spectral confusion with native vegetation is a well-known challenge in agroforestry and tree crop 
systems, particularly in mapping shade coffee [68,69], oil palm [70-73], and evergreen rubber tree 
plantations [74,75]. 

In this study we developed a cross-sensor approach to map forest types that combines infrequent 
airborne hyperspectral imagery with multitemporal satellite imagery. Hyperspectral data has frequently 
been fused with LiDAR, and to a lesser extent SAR, [41,47,76,77] to conduct single-date mapping of 
forest types. However, this approach is often constrained by data availability and cost to relatively small 
areas. Incorporating multitemporal, inter-annual data on forest disturbance and spectral change into 
classifications may help to distinguish tree plantations and secondary forests from less disturbed mature 
forests across large areas. Currently, the data source with the longest temporal record and global 
coverage is the freely- available imagery collected by the moderate-resolution multispectral Landsat 
sensors [77]. Landsat data has been effectively used to map and age tropical reforestation [12,78], and 
single-band spectral chronologies have been used to track disturbance to forests and forest recovery over 



Remote Sens. 2015, 7 


5664 


time [79,80]. Although frequent cloud cover can lower the utility of multitemporal approaches by 
decreasing image frequency [81], less frequent image intervals may still contain valuable information 
for distinguishing undisturbed primary forest from reforestation in recently disturbed sites [78,81,82], 

The principle objectives of this study were to: (1) test whether integrating hyperspectral imagery with 
multitemporal Landsat information increases discrimination of tree plantations, secondary, and mature 
forest types, compared to hyperspectral imagery alone; and (2) examine whether this integration also 
improved mapping of tree plantation species composition. We hypothesized that accurate plantation 
discrimination would be more difficult when tree plantations are comprised of native species. We further 
hypothesized that integrating Landsat multitemporal data with hyperspectral data would improve the 
overall single-date classification accuracy of tree plantations in northeastern Costa Rica. 

2. Experimental Section 

2.1. Study Area 

The study area encompasses principal reforestation areas in the lowlands of northeastern Costa Rica, 
covering a total of 4509 km 2 (Figure 1). We selected this region for study because of the extensive 
availability of satellite and airborne remote sensing data, a history of large-scale reforestation efforts, 
and the presence of an established PES program (1996) that subsidizes tree planting and maintenance 
among landowners. Ongoing partnerships between American and Canadian space agencies and the 
Costa Rican government have led to numerous aerial missions in the last decade to collect hyperspectral, 
LiDAR, SAR, and high-resolution multispectral data across much of the country [83,84], The long-term 
rainforest monitoring plots, experimental tree plantations, and secondary forests at La Selva Biological 
Station have been the particular target of remotely sensed data collection. 

The northeastern lowlands are an agricultural frontier whose settlement expanded rapidly in the late 
1960s leading to widespread deforestation [29,30,85,86] that slowed after 2001 [9,25,34], The study area 
has moderate variability in annual rainfall (3.2 + 0.8 m/year) and elevation (108 ± 98 m), with central 
ranges of low hills cut by broad river valleys giving way to coastal plains to the east. Despite loss of 
over half of the area’s forest cover, the Atlantic lowlands retain large patches of old-growth forest outside 
protected areas [9,87], The SJLSBC was established in 1997 to maintain and re-establish forest connectivity 
between Costa Rica’s montane parks and Nicaragua’s lowland Indio Maiz Biological Reserve [88], 

To evaluate the extent of reforestation in this landscape, we mapped the narrow central waist of the 
San Juan-La Selva Biological Corridor and adjacent areas using aerial hyperspectral imagery (Figure 1). 
Adjacent areas outside the SJLSBC were mapped to the northwest and southeast (along flight lines) 
within Costa Rica (hereafter, the “study region”; Figure 1). 

Common tree species planted within this region include Tectona grandis, Gmelina arborea, Vochysia 
guatemalensis, Terminalia amazonia, Hieronyma alchomeoides, and occasionally Dipteryx panamensis, 
Terminalia ivorensis, or Vochysia ferruginea [18]. Private commercial plantations of 
non-native Tectona and Gmelina and citrus fruit orchards ( Citrus spp.) were common within the region 
during the study period. Ecological characteristics of the most commonly planted tree species are 
described in Table 1 ([18], [89-96]). 
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Table 1 . Characteristics of the selected tree plantation species in this study. The first three 
species ( Citrus spp., G. arborea, and T. grandis ) are non-native, while the rest are native tree 
species. Three of the groups (referred to as species for simplicity) were actually genera, with 
species combined because of low sample sizes: Citrus spp. (species unknown), Vochysia spp. 
(, guatemalensis and ferruginea), and Terminalia spp. ( amazonia and ivorensis). 


Species 

Uses [18] 

Timber Harvest 
Rotation [89-93] 

Canopy [89-94] * 

Canopy 
Closure Rate 
[18,95-96] 

Potential 
Understory 
Habitat [94] * 

Management 
Issues [92-93, 96] * 

Citrus spp. 

Fruit 

NA 

Intermediate, with 
row-gaps 

NA: row 
cultivation. 

Grass 

Intensively 
managed and 
cleared 

Gmelina arborea 

Timber 

3-15 years 

Dense 

High to Very 
High 

Short, thin 

Disease, herbicide 
when planting 

Tectona grandis 

Timber 

15-25 years 

Dense to thin, 
semi-deciduous 

Medium 

Tall, dense 

Fire or manual 
clearing of 
understory required. 

Vochysia spp. ( gu . 
and fer.) 

Timber, 

habitat 

15-25+ years 

Thin to intermediate 

Medium to 
Medium-High 

Tall, dense 

N/A 

Hieronyma 

alchorneoides 

Timber, 

habitat 

25^-0+ years 

Thin to intermediate 

Medium 

Tall, dense 

N/A 

Terminalia spp. 
(am. and ivor.) 

Timber, 

habitat 

25-40+ years 

Very thin 

Medium 

Tall, dense 

N/A 


* denotes where literature data are supplemented with personal observations from field work. 


2.2. Remote Sensing Data 

HyMap II hyperspectral data were acquired by the NASA CARTA-II flight campaign conducted over 
northern Costa Rica from 01 March to 25 March, 2005 [83]. The HyMap II whiskbroom sensor 
(manufactured by Hy Vista Corporation, Castle Hill, Australia) included 125 spectral channels covering 
the 458 to 2491 nm range at a 15 nm sampling interval. Water absorption regions centered at 1350 and 
1850 nm were excluded. The instantaneous field of view covered a 61° swath with a 14.2 to 16.7 m 
pixel size [83]. Seven HyMap images with low cloud cover (<20%) covering the central SJLSBC and 
adjacent areas were selected for analysis (Table Al). 

Image pre-processing and processing steps to normalize reflectance values and improve image quality 
for enhancing discrimination of forest types and plantation species are summarized in Figure 2. We used 
the HyMap surface reflectance data products released by Hy Vista, which included georectification from 
onboard DGPS, ATREM model processing to correct for atmospheric effects, and EFFORT polishing 
of reflectance values, described in Arroyo-Mora et al. [97]. We resampled HyMap images to the mean 
pixel size (15 m) using the nearest neighbor method and co-registered image tiles to a 2005 Landsat 7 
pan image {<1.5 mroot mean squared error (RSME); Table Al) using an automated tie point program [98], 
Clouds, cloud shadows and terrain shadows were masked from each image using manual decision trees 
based on NDVI and hyperspectral bands centered on 460, 500, 1170, and 1460 nm. We corrected the 
masked images for cross-track and along-track illumination differences using a bilinear cross-track 
correction algorithm (A. Singh, in prep). Because of differences in illumination among image dates and 
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residual errors from georectification (<7.5 m RSME), we conducted cross-image illumination correction 
using the haze correction relative normalization method [99], selecting the reference normalization 
image that had the lowest blue-band reflectance in shadowed areas (Table Al). 


HyMap and Landsat Image Processing 



Figure 2. Flow diagram of pre-processing and processing steps. Blue diamonds denote 
images, green squares mark image processing, white boxes and circles denote data derived 
from images or field work, grey circles mark randomly selected, independent testing and 
training data, and red circles mark analysis products. 

We mosaicked the resulting final images, replacing cloudy and shadowed pixels with clear pixels in 
overlap areas where possible (Figure 1). Some minor radiometric differences between images persisted 
after correction; we used spectrally stable old-growth forest targets [81] to assess among-image 
differences and compare spectral variation between the HyMap mosaic and Landsat (see below) 
imagery. We estimated that among-image radiometric differences led to an additional 9% variance in 
spectral reflectance, on average, at stable forest targets. We further addressed radiometric variability 
across image tiles by collecting training and validation data across the entire study area (see Section 2.3). 

Multispectral data were acquired by the Landsat 5 and 7 sensors (scene path 15, row 53) in four years 
(1986, 1996, 2001, and 2005) with low annual cloud cover, between the months of October and March 
(Table Al). Each year selected was further composited to remove existing clouds using the next closest 
image dates to the degree possible (Table Al). Each Landsat 5 image was geometrically corrected to 
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Landsat 7 images to less than 0.5 pixel RMSE (<15 m) using an automated tie-point program [98]. We 
corrected images for atmospheric effects using LED APS [81] and radiometrically normalized all images 
to the most haze-free image in the time series (Table Al) using the MAD algorithm [100]. To remove 
clouds and Landsat 7 missing scan line errors, all images except 2001 were composites of two or three 
image dates separated by less than thirteen months. Prior to image mosaicking, clouds and line errors 
were masked using custom decision trees employing Band 1, 3, 4 and 6 thresholds. Further details on 
image processing can be found in Fagan et al. [9] supplemental. 


Class 

Hymap, 794 nm 

Landsat 
Mean Distance 

Forest types 

Reflectance 

Euclidean distance metric 

Mature Forest (MF) 

0.296 

0.007 

Secondary Forest (SF) 

0.340 

0.036 

Tree Plantation (TP) 

0.439 

0.061 


Figure 3. Two images of La Selva Biological Station, with land-use classes labeled at points 
for illustration, and a table showing the spectral values of the points in each image. Image A 
(left) shows a metric derived from Landsat imagery — the mean of the Euclidean distance 
(ED) in Band 5 between all four image pairs in a Landsat image stack (1986-2005). Warmer 
colors signify greater mean change in Band 5 over time; both reforestation and deforestation 
events show high mean change; Image B (right), showing the same area, is a three-band 
HyMap reflectance composite (593:794:534 nm). To illustrate radiometric variation between 
images, the highest spectral contrast between two image tiles in the entire HyMap image 
mosaic can be seen just below the mature forest point, in the bottom left corner of the red box. 

To highlight recently disturbed forest and tree plantations in a single measure for multitemporal 
analyses, we used a single Landsat band (band 5; 1550-1750 nm) for each pixel across the image time 
series (Figure 3). Landsat band 5 by itself has been used to detect forest disturbance in previous 


Remote Sens. 2015 , 7 


5668 


studies [101,102], and in our preliminary analysis (M.E. Fagan, unpublished data), it performed as well 
or better than other single-band metrics like Normalized Burn Ratio and Tasseled Cap bands [79,80]. 
For each Landsat image pair we calculated the Euclidean spectral distance in band 5, across all possible 
image pairs; masked cloud and scan-line error pixels were omitted from the calculation. We used the 
mean and variance of spectral distance to summarize the magnitude and variability of spectral change 
for each pixel. For example, pixels with a larger mean distance and high variability represent forest areas 
that likely experienced disturbance, such as tree plantations with rapid harvest rotations. 

To further characterize forest disturbance and reforestation, we classified the Fandsat image time 
series. Prior to classification, we selected six spectral bands (Fandsat bands 1-5 and 7) and calculated 
four vegetation indices from the Fandsat data. The vegetation indices were NDYI [103], the Brightness 
and Green bands from the Tasseled Cap transformation [104], and a normalized difference of Band 2 
and Band 5 (ND25, calculated like NDYI). These ten bands, along with elevation derived from a 
hole-filled SRTM DEM (v. 2.1) at 90- meter resolution [105,106], were the inputs into the classification. 
We used a Random Forest supervised classifier (described below) and field-collected training and 
validation data (for full details, see Fagan et al. [9]; Figure Al). Landsat images were first classified to 
forest and nonforest; these forest masks were integrated over time to distinguish stable, mature forest (forest 
persisting from 1986 to 2005) from reforestation and open land cover. Independent testing data indicated 
high within-year classification accuracy of the forest masks (e.g., 96% overall accuracy for 2005) and among- 
year forest-change accuracy for the three land cover types (88% overall; see Fagan et al. [9] for details). 
Because distinguishing reforestation from other forest types was our primary goal, we converted the Landsat- 
derived 2005 land cover map into a mask that indicated the presence or absence of reforestation. 

To permit the inclusion of the 30 m Landsat data in the hyperspectral classification, final summary 
metrics and land cover variables were resampled to a 15 m pixel size and then matched to the 
geo-registered HyMap image using a nearest neighbor resampling method. 

2.3. Field Data Collection 

Georeferenced land use data came from three sources: field observations, surveyed boundaries of 
managed forests (polygons), and visual interpretation of georeferenced high-resolution aerial photos. 
We collected 1859 land use data points in all for classification model training and field validation. 
Field observations consisted of a total of 1575 land use points collected using a handheld global 
positioning system (GPS) device during separate field seasons in 2004 and 2005 (Trimble GeoXT; [25]) 
and 2009 to 2012 (Garmin 60CSx; [9]). The 2009 to 2012 field data was backdated to 2005 by comparing 
it with high-resolution aerial photos collected during the CARTA-II mission (0.5 m pixels). An effort 
was made to locate field points at intervals greater than 250 m along paved and unpaved roads across 
the region. Each land use point was located a minimum of 60 m from a road. Information recorded at 
each point included the dominant land use within a 60-m radius area, tree species, and additional 
information on land use history such as forest age. Where the exact age of secondary forests was 
unknown, visual assessment of forest cover in two types of imagery was used to determine the 
approximate age of secondary forest points: the 1986 and 1996 Landsat imagery, and black and white 
aerial photos taken in 1986 and 1992 (1:60,000 scale; [30]). 
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Tree plantation boundaries were georeferenced by a local forest extension organization, FUNDECOR. 
Boundary polygons were collected using Garmin 60CSx GPS units and in many cases were refined 
through repeated visits. Field data from these polygons were generated by randomly locating target 
points at a distance from the polygon edge (>30 m inside the polygon) and then checking the resulting 
points ( n = 64 points) in the high-resolution CARTA-II imagery. CARTA-II aerial photos were also used 
to directly create 220 land use points. Image interpreted points were selected a minimum distance of 
500 m from existing land use points to increase the number of training and validation samples in 
under-surveyed image tiles within the hyperspectral image mosaic. Each point selected from aerial photos 
was surrounded by at least a 60 m radius homogenous area. 

To develop training data for image classification, we randomly selected 70% of the land use data 
points in each class. Each training point was buffered by a 60 m radius polygon known to be a single 
land use type from field and aerial photo inspection, and all pixels that fell within the 60 m buffer were 
assigned to that land use class as training pixels. Overlapping land use polygons were winnowed by 
random selection; all resulting polygons were separated by >150 m. This resulted in a total of 1287 points 
and 33,182 pixels for training, representing 0.2% of the hyperspectral image. No single land use class 
was trained with less than 250 pixels and the median was 1028 pixels (Table 2). 

2.4. Forest Type Classification 

2.4.1. Land Use Classes 

We classified three main forest types, tree plantations, mature forests, and secondary forests, to meet 
our first objective of testing the ability of hyperspectral and multitemporal data to distinguish tree 
plantations from other forest types. Mature forests were defined as lowland rainforests or swamp 
forests older than available Landsat time series data ( i.e ., >24 years in age; (Table 2)). This category 
potentially includes a wide range of forest compositions, as the majority of mature forest constituted 
selectively-logged old-growth lowland rainforest (S.E. Sesnie, pers. comm.). Secondary forests were 
defined as completely-cleared area of forest regrowth <24 years in age (the limit of our h i storical 
records). Tree plantations consisted of closed canopy monocultures made up of one of six tree plantation 
species (Table 2). To improve discrimination of these key forest types and improve monitoring of land 
use change, we also distinguished 1 1 common non-forest land uses, for a total of 20 land use classes 
initially (Table 2). However, for the first set of analyses distinguishing forest types, land use classes were 
grouped post-classification into the three forest categories defined above and a non-forest class, 
categorized as “other” (Table 2). 

To meet our second objective and examine how well hyperspectral and Landsat data distinguish tree 
plantation species composition, subsequent analyses focused on classification accuracy for individual 
tree species in areas classified as tree plantations. Thus, we reported the class accuracy of tree plantations 
both as a single land cover class and by species composition. 
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Table 2. Class information and descriptions for the land-use classes in this study. The classification class was used for the initial model classification 
and map prediction, and the summary class was used for comparisons of the Random Forest models. Tree plantations were assessed both as a 
grouped class and as separated class (i.e., each species) in the final summary accuracy assessment, and so are marked in italics. Classes referred 
to as reforestation in the text are in bold, and include secondary forests and tree plantations. 


Summary Class 

Classification Class 

Short Name 

Descriptions 

Training 

Points 

Training 

Pixels 

Testing 

Points 

Other 

Banana 

Banana 

Large, export-oriented monocultures of banana 

54 

1507 

91 


Heart-of-Palm 

Hpalm 

Monocultures of heart-of-palm; occasional shade trees 

44 

1220 

81 


Pineapple 

Pina 

Large, export-oriented monocultures of pineapple 

47 

1302 

86 


Cassava 

Cass. 

Open monocultures of cassava 

15 

417 

68 


Bare soil 

Soil 

Reddish exposed soil; mix of inceptisols and andisols 

28 

772 

112 


Sand 

Sand 

Sandy soils, adjacent to river. 

23 

909 

74 


Clouds 

Cloud 

Cumulus clouds. 

104 

6342 

67 


Pasture 

Past. 

Open to wooded grassy pasture. 

180 

4990 

152 


Shade 

Shade 

Cloud shadows and dark, deep water. 

100 

4055 

51 


Urban 

Urban 

Mainly cement, asphault, and tin roofs. 

13 

372 

53 


Water 

Water 

Open water. 

67 

1212 

99 

Mature Forest 

Lowland Mature Forest 

Matfor 

Forest >24 years old (Fagan et al. 2013). 

153 

4259 

119 

Swamp Forest 

Swfor 

Forest >24 years old that is dominated by Raphia palms. 

10 

278 

65 

Secondary Forest 

Secondary Forest 

Sector 

Forest < 24 years old. 

61 

1695 

127 


Citrus 

Citrus 

Large orchards of Citrus spp. 

52 

909 

60 


Gmelina 

Gmel. 

Exotic tree plantations of Gmelina arbor ea. 

46 

1105 

80 

Tree Plantations 

Hieronyma 

Flier. 

Native tree plantations of Hieronyma alchorneoides. 

11 

310 

69 

Tectona 

Tect. 

Exotic tree plantations of Tectona grandis. 

38 

951 

75 


Terminalia 

Term. 

Native tree plantations of T. amazonia and the non-native T. ivorensis. 

11 

305 

61 


Vochysia 

Voch. 

Native tree plantations of V. ferruginea and V. guatemalensis. 

10 

253 

43 
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2.4.2. Classification Model Comparison 

We compared four separate models integrating multitemporal Landsat data into a single-date 
hyperspectral classification (Table 3). The first classification model (“Hyper”) consisted of hyperspectral 
data only. A second model (“HyperLs”) used both the hyperspectral data and two metrics derived from 
multitemporal Landsat spectral data (i.e., the mean and variance of band 5 over the image composite 
time series) as predictor variables in a supervised classification. The third model (“HyperLC”) modified 
the Hyper results post-classification using the time series of Landsat land-cover data to reclassify mature 
forests as secondary forest where they intersected Landsat time series-derived secondary forest. The 
fourth model (“HyperLsLC”) was exactly the same as the third model, except that it modified the results 
of the HyperLs classification model. 

Table 3. Summary, conceptual model of the four Random Forest models. Situations where 
only single-date training data exists could employ Models 1 or 2, while situations with 
multiple dates of training data could use Models 3 or 4. In our study, multiple dates of 
hyperspectral imagery do not currently exist for most of the region, limiting the utility of 
multiple dates of training data. 


Training Data 

Imagery Data 

Single Date 


Multiple Dates 

Hyperspectral 

(l).Hyperspectral (Hyper) 


NA 

Hyperspectral + Landsat 

(2).Hyper + Landsat spectral data (Ls) 

(3).Hyper + Landsat land cover data (LC) 
Hyper + Ls + LC 


Prior to implementing classification models, we reduced data dimensionality for the training data 
using linear discriminant analysis (LDA). LDA has been extensively used to discriminate tree species 
with remotely sensed data and performs well with hyperspectral data despite violations of normality 
assumptions [39,107], In this analysis, the 105 hyperspectral bands (Figure 4) were reduced to 19 LDA 
bands with low correlation. The LDA helped to reduce radiometric differences across image tiles, 
improving the consistency of training and testing samples across the study area. The LDA model was 
developed using all 20 possible land cover classes because initial analyses revealed that combining land 
use classes prior to classification markedly decreased forest type discrimination. 

The Random Forest (RF) machine learning algorithm was used to develop a classifier for the 
LDA-processed hyperspectral and Landsat image data [108], Random Forest is an ensemble decision 
tree classifier that is non-parametric and flexible with regards to data distributions and assumptions 
(see [108] for a full description). RF has been used for hyperspectral classification [45,109,110] 
and the analysis of datasets with hundreds to thousands of potentially correlated predictors (e.g., [1 1 1]). 
We implemented RF using the RandomForest package in the R statistics package v. 2.14.1 [112], 
with default settings and n = 1000 decision trees per classification. 
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Figure 4. Mean hyperspectral reflectances for the forest classes in this analysis, derived from 
the HyMap imagery at the training data locations. Class sample sizes are shown in the 
legend; see Table 2 for class name acronyms. The gray polygon represents one standard 
deviation above and below the yellow secondary forest class (abbreviated here as “Secfor”). 
Water- absorbing spectral regions (centered on 1.3 and 1.9 pm) were excluded from the 
analysis and are not shown. 

2.4.3. Post-Classification Processing 

Following hyperspectral image classification, post-processing was conducted to remove some 
misclassification errors and improve discrimination of individual land cover classes. The Random Forest 
classifier outputs from the Hyper and HyperLs models were used to predict land use maps, and each map 
was filtered to remove classification speckle anomalies in three steps. First, a duplicate image was 
processed with a 3 x 3 moving window majority filter. Second, adjacent pixel clusters less than 4 pixels 
in size were sieved in the original classified map. Finally, sieved pixels were replaced with pixels from 
the filtered duplicate image. This process removed speckle, preserved narrow linear features from 
replacement by majority classes, set the minimum mapping unit at 0.09 hectares (four pixels), and 
increased map accuracy by 1.7%-2% across classification models. Post-processed map accuracy was 
assessed using independent validation data, as described below. 

The post -processed classification results from Hyper and HyperLs were the starting points for the 
HyperLC and HyperLsLC models, respectively (Figure 2, Table 3). We used the Landsat-derived 2005 
reforestation mask to reclassify mature forest in the Hyper and HyperLs land use maps to secondary forest. 
We then assessed the accuracy of the reclassified land use maps for the HyperLC and HyperLsLC models. 
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2.5. Accuracy Assessment 

2.5.1. Independent Validation Data 

To assess classification accuracy with independent validation data, we evaluated 1086 testing points 
using a stratified random approach. For each land use class, 64 randomly-located data points were 
generated using the Raster package in R statistics software (v. 2. 14). The land use class of each validation 
point was identified using aerial and Landsat imagery. Points were discarded when they intersected a 
cloud, cloud shadow, or were located within a land use type that could not be identified. 

Validation samples from aerial and satellite imagery for tree plantation species and secondary forests 
could only be confirmed for a limited number of sites, contributing to a low number of initial samples 
for these forest types. In order to obtain a sufficient number of independent samples to evaluate each 
classification model, we combined the stratified random sample with field validation data (Section 2.3) 
randomly withheld from the classification model. We randomly withheld 30% of the field data for testing 
purposes, or 558 land use points. All retained testing points were separated by >100 m from each other 
and training data polygons to reduce spatial autocorrelation between samples. The median number of 
testing samples across all classes was 74 (Table 2). 

2.5.2. Subsampling the Validation Data 

To correct for an uneven number of testing samples [113], we equalized sampling effort across classes 
by creating multiple balanced sets of test data using a subsampling bootstrap of the original test data [114]. 
“Subsampling” bootstrap samples without replacement were developed relatively recently [114] and are 
mathematically straightforward and considered more robust to lack of independence than traditional 
bootstraps [115]. Subsampled bootstraps allowed the estimation of confidence intervals through an 
asymptotic sampling function based on the Central Limit Theorem (by default and in our study, a square 
root function), a selected subsample size ( n = 20, sampled without replacement) from each land use class, 
and the original population size of validation points (Table 2) [114,115]. Subsample bootstraps were run 
1000 times to develop confidence intervals around accuracy estimates and approximate a stratified 
random sampling design [116-118]. 

We compared RF model performance using four summary land use categories: tree plantations, 
secondary forests, mature forests, and all other classes (Table 2). Each of the four summary land use 
categories (e.g., mature forest) was subsampled as one class in the accuracy assessment, with proportional 
sampling from each sub-class (e.g., lowland mature forest and swamp forest). For each model, we 
evaluated RF model accuracy using subsampled test data, calculating the mean and 95% confidence 
interval of several statistics from error matrices: overall percent accuracy, Kappa statistic, and omission 
and commission accuracies for each class. For the class accuracy statistics, we used the size of the original, 
unbootstrapped sample (Table 2, “Testing points”) to calculate standard error in confidence intervals. 
For the overall accuracy and Kappa statistics, we used the mean of the test sample size across land use 
classes (82 pixels) to calculate standard error in confidence interval estimates. To statistically compare 
overall and class accuracies among classification models, we used generalized linear models (GLM). 
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3. Results and Discussion 


3.1. Forest Type Classification 

The four RF models showed increased overall bootstrapped accuracy as additional multitemporal data 
were added to each classification approach (Hyper model, 83.5%; HyperLsLC: 88.5%). The addition of 
land cover information on secondary forests (LC models) significantly improved ( p < 0.0001) accuracy 
by +1.7 to +2.7 (Figure 5); Kappa values also significantly increased ( p < 0.0001; Figure A2). The addition 
of multitemporal spectral data led to a slightly smaller increase in accuracy than the Landsat -derived 
reforestation data (model HyperLC vs. HyperLs, p < 0.0001), but combining both land cover and spectral 
data led to the highest overall model accuracy (HyperLsLC, p < 0.0001). For the Hyper and HyperLs 
models, the Random Forest “out-of-bag” (OOB) internal accuracy measure was much higher and less 
variable (-96%) than the overall accuracy derived from independent test data because spatial 
autocorrelation was reduced for independent test data (Figure 5). 
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Figure 5. Overall accuracy comparison of the four Random Forest models. For each model, 
the bootstrapped overall accuracy for the four summary land-use classes is shown in black 
(mean and 95% confidence intervals), and the original unbootstrapped accuracy is shown in 
red. Letters denote that bootstrapped accuracy is significantly different for all of the models 
{p < 0.0001). For the two models on the left with distinct RF classification models 
(see text), we show the overall mean accuracy of the full twenty-class RF model on the 
original unbootstrapped testing data and the corresponding OOB accuracy for the same 
twenty-class model. 

Tree plantations were well discriminated from secondary and mature forests across all models (Figure 6, 
Table S2). Although secondary and mature forests were confused with each other, the Hyper model 
classified tree plantations with 85.6% producer’s accuracy. Including multitemporal spectral data 
(models HyperLs and HyperLsLC) caused a significant (p < 0.0001), but small (+1.5), increase in the 
user’s accuracy of tree plantations. The inclusion of multitemporal land cover information (e.g., 
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the HyperLC model) did not increase the accuracy of tree plantation classification, because the information 
was only used to reclassify mature forests to secondary forests. 
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Figure 6. Producer’s and user’s accuracy for the Random Forest models, by summary 
land-use class (mean ± 95% Cl). Producer’s accuracy is defined as 100 minus the error of 
omission for a given class, and user’s accuracy is defined as 100 minus the error of com mis sion 
for a given class. 


Table 4. Confusion matrices of the bootstrap results for forest summary classes, comparing 
the Hyper RF model accuracy and the HyperLsLC RF model accuracy. 


Model: Hyper 


Reference Data 




Predicted 

Other 

Mature Forest 

Secondary Forest 

Tree Plantations 

Total 

User Acc. 

Other 

19,718 

0 

1272 

605 

21,595 

91.6 

Mature Forest 

67 

19,054 

4533 

1068 

24,722 

77.5 

Secondary Forest 

80 

533 

10,905 

1200 

12,718 

86.2 

Tree Plantations 

135 

413 

3290 

17,127 

20,965 

82.1 

Total 

20,000 

20,000 

20,000 

20,000 


Overall: 

Prod. Acc. 

98.6 

95.3 

54.5 

85.6 


83.5 

Model: HyperLsLC 

Reference Data 




Predicted 

Other 

Mature Forest 

Secondary Forest 

Tree Plantations 

Total 

User Acc. 

Other 

19,727 

0 

1150 

688 

21,426 

91.8 

Mature Forest 

41 

18,915 

647 

146 

22,489 

95.9 

Secondary Forest 

104 

837 

15,153 

2158 

14,558 

83.5 

Tree Plantations 

128 

248 

3050 

17,008 

21,527 

83.6 

Total 

20,000 

20,000 

20,000 

20,000 


Overall: 

Prod. Acc. 

98.6 

94.6 

75.8 

85.0 


88.5 
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The Hyper model had relatively low producer’s accuracy for secondary forests (54.5%) and user’s 
accuracy for mature forests (77.5%), primarily because of confusion between these forest types. 
The addition of multitemporal Landsat spectral data (models HyperLs and HyperLsLC) showed a 
significant increase (+5.8 to +7.5 between models, p < 0.0001; Figure 6) in the producer’s accuracy of 


Remote Sens. 2015, 7 


5676 


secondary forests. Models with multitemporal spectral data had markedly improved discrimination of 
mature forests from tree plantations and secondary forests, and only slightly lower discrimination 
between tree plantations and secondary forests (Table S2). As a result, small but significant increases in 
user’s accuracy of mature forests were observed in these models (p < 0.0001), along with small but 
significant declines in producer’s accuracy for mature forests and in user’s accuracy for secondary forests. 

Incorporating information on secondary forests from Landsat-derived land cover maps (models 
HyperLC and HyperLsLC) significantly improved discrimination of secondary forests overall (12.7 to 
15.0 between models, p < 0.0001; Figure 6). There was a concomitant small decline in accuracy for 
mature forests (~-2 across models in user’s and producer’s accuracy, respectively; Figure 6, Table S2). 
Because initial analysis indicated heavy confusion between mature and secondary forests in the Hyper 
model (Figure 6), we assumed that the spectral similarity and species overlap between these two forest 
types justified reassignment based on multitemporal information. Despite the highest overall accuracy 
for the HyperLsLC classification models, secondary forests only achieved acceptable producer’s 
accuracy (75.8%) because of persistent confusion with tree plantations and non-forest land uses (Table 4). 

3.2. Tree Plantation Species Discrimination 


Across all of the models, all six tree species within plantations were classified with acceptable (>67%) 
to high (>80%) producer’s accuracy and, with the exception of Vochysia spp., high user’s accuracy 
(Figure 7, Table 5). 
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Figure 7. Producer’s and user’s accuracy across the four Random Forest models for tree 
plantations as a general class (left column; mean ±95% Cl) and for each tree plantation species. 
Producer’s and user’s accuracy are defined in Figure 6. Please note that, for clarity, the order 
of the species and RF models has changed slightly from other figures. The three non-native 
species are grouped on the left, while the three native species are grouped on the right. 
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Table 5. Full confusion matrix for the HyperLsLS Random Forest model, with tree plantations 
as a class separated into the component tree species. The confusion matrix in bold is same 
one shown in Table 4. Row and column totals and accuracies in bold refer to the confusion 
matrix in bold. The individual tree species (italics) sum up to the all tree plantation species 
category in bold to the top left. We show mean user’s and producer’s accuracy for the filtered 
final map from 1000 subsampled runs with 20 samples each (20,000 samples total). 
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All Other 

19,727 
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1150 
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47 
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91.8 
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41 

18,915 

647 

146 

0 
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19,749 
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Second. 

104 

837 

15,153 

2158 

114 
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Forest 
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248 

3050 

17,008 







20,434 

83.6 

Citrus 

12 

0 

640 


3005 

0 

0 

46 

0 

82 

3785 

79.4 

Gmelina 

0 

0 

612 


0 

2893 

0 

0 

0 

0 

3505 

82.5 

Hieronyma 

12 

0 

0 


0 

0 

2585 

0 

0 

157 

2754 

93.9 

Tectona 

24 

0 

0 


0 

0 

0 

3009 

0 

0 

3033 

99.2 

Terminalia 

46 

0 

464 


0 
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0 

0 

2317 

70 

2897 

80.0 

Vochysia 

34 

248 

1334 


0 

0 

100 
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0 

2744 

4460 

61.5 

Total 

200,000 

20,000 

20,000 

20,000 

3351 

3284 

3285 

3362 
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85.0 

89.7 

88.1 

78.7 

89.5 

67.8 

83.1 




The main source of omission error for tree plantation species was misclassification as secondary 
forests (Table 5). The three non-native tree species had higher classification accuracy than the native 
tree species, and Terminalia spp. had the lowest producer’s accuracy and greatest confusion with 
secondary forests, followed by commission errors for Vochysia spp. (Table 5). 

As noted above, the reclassification of secondary forest using land cover maps (the-LC models) had 
little effect on tree plantation classification accuracy. Differences between the Hyper and HyperLC models, 
for example, were likely a result of minor differences between RF decision tree classifiers that were 
developed separately. Including multitemporal spectral data, however, positively and negatively affected 
classification accuracy for plantation tree species composition. Tree species with more open canopies, 
including Citrus, Tectona, and Terminalia, had small declines (l%-7%) in producer’s and/oruser’s accuracy 
in the Ls models (Figure 7). Vochysia and Gmelina, by contrast, had large increases (4%-8%) in both 
producer’s and user’s accuracy, while Hieronyma was largely unchanged between models (Figure 7). 

Landscape analysis of the most accurate overall map (HymapLsLC, Figure 8; Table A3 has the full 
map accuracy statistics) revealed that 10.5% of all forests in the study region were tree plantations in 
2005, but only 1.3% of all forests were native tree plantations (Figure 9). 
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Figure 8. Full twenty-class land cover map. The HyperLsLC land-use classification is shown 
inside the white outline. The regional forest-nonforest map outside the white outline is 
derived from 2005 Landsat imagery (see Fagan et al. [9]); nonforest is shown in light gray 
and forest in light green. 

3.3. Hyperspectral Classification Accuracy 

3.3.1. Classification of Tree Plantations 

We found that single-date hyperspectral data was able to accurately distinguish tree plantations from 
mature forests and secondary forest, although secondary forests had poor overall accuracy. Furthermore, 
hyperspectral data alone (Hyper model) accurately classified all six tree plantations to species with 
acceptable to high (75%-92%) producer’s accuracy and intermediate to very high (57%-99%) user’s 
accuracy (Figure 7). There was marked variation among tree plantation species in classification accuracy 
(Table 5). A common non-native plantation species, Tectona spp., had the highest user’s and producer’s 
accuracy across models, most likely because of its distinctive foliage and spectral values in the mid 
infra-red (1490-1760 nm and 2000-2400 nm; Figure A3). Terminalia spp. tree plantations had the 
lowest producer’s accuracy across models and were often misclassified as secondary forest. This tree 
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genus has a thin, high canopy with low leaf area index (LAI) [89] and its spectral signature is likely to 
have been influenced by abundant native understory trees. 

Because monospecific tree plantations were of limited spatial extent in our landscape relative to 
secondary and mature forests (Figure 9), we chose to first assess accuracy for tree plantations as a single 
map category, and then assess accuracy for individual reforestation species classes. Using a stratified 
random and bootstrapped sample of our validation data, producer’s and user’s accuracies for the tree 
plantation species were acceptable to high across models (Figure 7). In support of our first hypothesis, 
non-native tree species had higher producer’s accuracy than native species across models (Figure 7). 
Native genera, like Vochysia spp., were present in multiple forest types (mature and secondary forest, 
and tree plantations), which may have lowered accuracy. Vochysia ferruginea, in particular, is quite 
common in secondary forests in the region (R.C. Chazdon, pers. comm.). It is also likely that the higher 
number of training pixels for the non-native species may be partly responsible for differences in spectral 
separability between native and non-native tree species. 



Figure 9. Area of different land-uses in the final 2005 map. Natural forest types are in bold, 
and tree plantation species are in italics to the right. 

Tree species classification accuracy from this study was comparable to other studies that have 
attempted to separate tropical tree species using aerial hyperspectral data. For example, Clark et al. [43] 
achieved 92% overall accuracy in separating seven Costa Rican tree species with high-resolution 
HYDICE imagery, but individual species’ producer’s accuracies varied from 70%-100%. More recently, 
Feret and Asner [39] had -85% overall classification accuracy with six Hawaiian tree species and 73.2% 
overall accuracy with seventeen tree species using high-spectral resolution CAO imagery. In our study 
region, reflectance values for tree species within plantations may have overlapped with natural forest 
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because of dense understory regrowth, local canopy dieback from disease or seasonal leaf loss, the 
occurrence of similar species in both plantations and secondary forests, and spectral averaging in 
moderate-resolution imagery [48, 1 19]. However, there was little confusion between tree plantation species 
in our study. As a result, assessing the accuracy of tree plantations as one singular forest type rather than 
as individual tree plantation species only slightly improved the producer and user’s accuracy (85% and 
84%, respectively; Table 5) over the mean accuracy across tree species (83% and 83%, respectively). 

3.3.2. Classification of Other Forest Types 

Despite high accuracy with tree plantations, we were not able to accurately classify secondary forests 
using hyperspectral data alone (<55% producer’s accuracy), in large part due to their confusion with 
mature forests and, to a lesser extent, tree plantations (Table 3). The poor secondary forest accuracy we 
observed is similar to the results of Galvao et al. [55] with CHRIS-PROBA imagery in Brazilian 
landscapes, rather than the higher accuracy reported by Thenkabail et al. [54] using Hyperion imagery 
in African forests. Because spectral NIR reflectance saturates as leaf area index and canopy closure 
increases during succession, we might expect mature forests to be spectrally similar to older reforestation 
in the absence of consistent species differences in reflectance [56]. If this is true, moderate-resolution 
hyperspectral images may be similar to moderate-resolution multispectral images in their inability to 
distinguish mature from older secondary forests, particularly in wet tropical forest that can quickly 
achieve canopy closure in less than a decade [55, 77], 

Future efforts to map secondary forests using single-date hyperspectral imagery could be more 
successful using spectral signatures of pioneer species (e.g., Cecropia spp.) and canopy N content as 
indicators of secondary forest [54, 122-124], In our image mosaic, the continuous spectral transitions 
between forest successional stages observed in other tropical regions [125] may have been masked by 
spatial variation in reflectance due to selective logging, image differences, and atmospheric haze. Forest 
undisturbed by selective logging was relatively rare in our landscape. Forests with different logging 
intensities have been shown to exhibit distinct hyperspectral spectral reflectance [97], 

3.4. Hyperspectral and Multitemporal Data Classification Accuracy 

The inclusion of multitemporal Landsat information in the hyperspectral analysis significantly improved 
the classification accuracy of tree plantations, secondary forests, and mature forests (Figure 6). We were 
able to distinguish between all forest types with good producer’s and user’s accuracy by combining 
hyperspectral data, Landsat multitemporal spectral data, and Landsat-derived information on forest 
history (the HyperLsLC model; Table 5). We interpret these results as support for our hypothesis that 
including multitemporal information would improve map accuracy for tree plantations. However, 
increased accuracy was largely a result of better discrimination of mature and secondary forests. 
Relatively little gain in accuracy was achieved for tree plantations as a single land use category. 

We were able to classify the six tree species within plantations with reasonably high accuracy using 
hyperspectral imagery alone. Including multitemporal spectral information slightly lowered the 
classification accuracy of species with open and/or temporally variable canopies (the tree crop Citrus, 
the deciduous Tectona, and the thin-canopied Terminalia). However, multitemporal information markedly 
increased the classification accuracy of the tree species with more dense canopies ( Gmelina and 
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Vochysia ). Temporal variability in reflectance within plantations dominated by open species may not be 
uniform across space, likely because of forest management history and site factors that can influence 
canopy and understory structure (e.g., [52,57,58]). Overall, there was a small average gain in user’s 
accuracy across species. There was mixed support for our second hypothesis, suggesting that adding 
multitemporal information to hyperspectral classification may be most useful in regions more 
extensively dominated by secondary forests and dense-canopied tree plantations. The additional 
processing needed to include multitemporal satellite imagery may be most efficiently applied in cases 
where distinguishing secondary and mature forest types from one another is needed to assess 
landscape-scale forest conditions. 

The accuracy improvement from including multitemporal spectral data was smaller than anticipated. 
In the Landsat imagery, mature forests frequently showed minor spectral changes over time (Figure 3) 
or were contaminated by cloud artifacts despite cloud masking, leading to false positives for spectral 
changes. Spectral changes from forest clearing and secondary forest regrowth were large in many 
locations (Figure 3), but in some locations changes were smaller, with incomplete disturbance and rapid 
forest recovery. The mismatch in spatial resolution between Landsat and HyMap may also have 
contributed to error in the spectral change metrics, as secondary forest and tree plantations typically 
occurred in small patches adjacent to mature forests. 

Greater gains in accuracy from integrating multitemporal information with hyperspectral data would 
likely be realized by tracking reforestation across a greater number of Landsat image dates with 
enhanced cloud filtering and replacement techniques, utilizing extensive and freely available image 
archives [78,82]. We used only four Landsat image dates for this analysis because of extensive 
cloudiness in this region. Better indices of spectral change could be derived from a subdecadal cloud-free 
composite methodology [79], or a monthly Landsat time series with a cloud pixel filter based on temporal 
outliers [126]. The inclusion of more frequent imagery can potentially reveal spectral or land use 
trajectories unique to different species within plantations or to specific management techniques applied, 
like tree thinning (e.g., [75]). The use of a variety of spectral indices might also improve the sensitivity 
of the data to spectral shifts with regrowth and disturbance [79]. The utility of a multitemporal approach 
will likely increase as new multitemporal change indices are tested [79,126] and the Landsat archives 
are more specifically processed to facilitate multitemporal comparisons [12]. There is an increasing number 
of pre-processed image products for making consistent land change measurements over time [8 1 , 1 27, 1 28] . 

3.5. Model Accuracy Assessment 

In our study, random sampling of tree plantations for independent accuracy assessment was difficult 
due to the low reliability of distinguishing tree plantations from secondary forests on aerial photos. 
As a result, we included opportunistically- sampled ground validation data to increase sample size for 
secondary forests and tree plantation species. Because the inclusion of ground validation data 
imbalanced class sizes across our class strata [113,118], we employed a subsampling bootstrap to 
equalize class sizes in a stratified random sampling validation. The subsampling bootstrap permitted a 
more robust estimate of the accuracy confidence intervals for each class, which has two principal 
limitations. First, equal class sample sizes overestimated errors of commission for common classes like 
secondary forest by assuming that all classes were equally abundant in the study area. A stratified random 
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sampling scheme with equal class sizes emphasizes user’s accuracy (errors of omission) over overall 
accuracy [129]. Secondly, the subsampling bootstrap depends on sufficient sample sizes and well-spaced 
sample locations to estimate class accuracy, and may over- or underestimate error for classes with low 
sample sizes, such as areas reforested with native species in our study region. 

Band reduction and transformation using LDA decreased but did not eliminate inter-image differences 
(Figures 1 and 3), even though training data for all classes were collected across multiple image tiles. 
Radiometric correction of hyperspectral imagery, especially imagery mosaics with limited overlap 
among image tiles, is an area where additional research effort is needed on how to more effectively 
normalize reflectance data across flight lines [100,130,131]. Accurate surface reflectance estimates 
across hyperspectral image flight lines will be required for other approaches utilizing field spectroscopy 
rather than extensive field data collection efforts [132], Improved radiometric, terrain, and atmospheric 
correction between our HyMap image tiles could have potentially increased classification accuracy. 
However, because our training and validation data for each class came from multiple image tiles and 
encompassed a variety of site, terrain shadowing, atmospheric, and image-illumination differences 
(Figures 4 and A3), our final classified land use maps were free of image artifacts (Figure 8). 

3.6. Status of Tree Plantations in Northeastern Costa Rica 

Primary forests are being converted to tree plantations and tree crops across the tropics [15,70,133]. 
However, in previously deforested landscapes like northeastern Costa Rica, tree plantations can dramatically 
enhance carbon storage and tree cover [ 1 34, 1 35] . Because of the rarity of native tree plantations in our region, 
maps that do not distinguish non- native tree plantations from secondary forests (e.g., [12]) are much more 
likely to have biased estimates of natural regeneration and deforestation rates than maps that attempt to 
distinguish plantations from secondary forest (e.g., [9,37]). We estimate that only -3% of the reforestation 
classes mapped by Fagan et al. [9] in this region were actually native tree plantation species, with the 
remainder secondary forests (-78%) and non-native tree plantation species (-19%). 

The dominance of non-native tree plantations in this region, coupled with the shorter rotation times 
of non-native species (Table 1), indicates that tree plantation cover will be highly dynamic in the coming 
decade. While plantation management will benefit timber supplies in this region, rapid harvest rates may 
limit the potential for long-term increases in forest connectivity from tree planting in this habitat corridor. 
Further, non-native tree plantations were often planted outside the SJLS corridor (Figure 8). The 
re-establishment of long-lasting and diverse forest cover in this landscape would likely benefit from 
adjusting reforestation payments to favor reforestation with higher-value native tree species. Native tree 
plantation species were largely planted within the SJLS Corridor and in the forested foothills to the 
southeast, adjacent to existing forest cover (Figure 8). Native species can have longer rotation times, but 
often require less intensive management (Table 1), and wood products and other ecosystem services 
provided by native trees may eventually be preferred over non-native species, as new data on growth 
rates and rates of return on investments become available [136], 

4. Conclusions 

Given the increasing importance of tropical tree plantations and tree crops, there is an urgent need to 
improve our ability to inventory and monitor tree plantations and distinguish them from natural forest types. 
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In this initial test of a new methodology, we found that hyperspectral and multitemporal data can be 
effectively integrated to discriminate tree plantations from secondary forests and mature forests with 88.5% 
overall accuracy. The addition of multitemporal spectral metrics and land use history derived from a Landsat 
image time series improved the classification accuracy of secondary forests and closed canopy tree plantation 
species, but slightly decreased discrimination of plantation species with more open canopies. Adding 
multitemporal information may only be worth the added image processing effort when image quality is high 
and/or project objectives include mapping of secondary forest or dense tree plantations. 

As increasing amounts of high-quality hyperspectral imagery become available from aerial 
(CAO, NEON, GLihT, AVIRIS, HyMap, CASI) and satellite (HyspIRI, EnMap) sensors over the next 
decade [42,77,130], our results suggest that it may be possible to use hyperspectral data alone to make 
accurate regional or even global maps of tree crop and plantation species. However, to inventory multiple 
tropical forest types in support of reforestation and forest management initiatives like REDD+, we show 
that it may be necessary to combine hyperspectral imagery with other remotely sensed data that has 
greater temporal frequency and a longer historical record. Both the acquisition of rigorous validation 
data on tree plantations and the integration of hyperspectral and multitemporal imagery can be logistically 
demanding, but we found that together they can reveal insights into reforestation patterns important to 
refining national reforestation programs. 
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Figure Al. The 2005 Landsat-based land cover map used as part of the multitemporal forest 
regrowth classification in this study. The red outline denotes the San Juan-La Selva Biological 
Corridor, and the white outline denotes the outer boundary of the HyMap land-use map (see 
Figure 8). Non- native tree plantations are shown in purple, and native reforestation (including 
secondary forest and native tree plantations) is shown in green; all other land cover classes are 
defined in Table 2. 

0.88 

0.86 

0.84 

m 0.82 
(0 
Q. 

CL 
03 

* 0.80 
0.78 
0.76 
0.74 



Hyper HyperLs HyperLC HyperLsLC 
Model Name 


Figure A2. Kappa values of the four different RF models (mean ±95% Cl) with four 
summary classes. All Kappa values differ significantly between models ( p < 0.0001). 
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Figure A3. Pairs of hyperspectral reflectance spectra, taken from the training data for all 
nine forest types. See Table 2 for sample sizes and class name acronyms. This figure is 
similar to Figure 4, but displays variability in each class with pairwise comparisons to 
secondary forest. Lines show mean reflectance, with buffer polygons showing ± one standard 
deviation. Secondary forests (pink lines, yellow polygons) are compared against the other 
forest classes (black lines, gray polygons). Note the large amount of variation in spectral 
reflectance for each forest type. 


Table Al. Image information for the Landsat image stack (1986-2011) and the Hymap 
image mosaic (2005). The reference image for radiometric correction is indicated for each 
of the two imagery types: Landsat and HyMap. 


Image Type 

Mosaic Year 

Dates Used 

Original Res. (m) 

Reference Image 

Landsat 5 

1986/87 

2/6/1986 

30 




3/13/1987 

30 


Landsat 5 

1996/97 

11/16/1996 

30 




12/21/1997 

30 


Landsat 5 

2001 

1/4/2001 

30 
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Table Al. Cont. 


Image Type 

Mosaic Year 

Dates Used 

Original Res. (m) 

Reference Image 

Landsat 7 

2005 

2/2/2005 

30 

** 



9/30/2005 

30 


HyMap 

2005 

3/1/2005 

15.4 




3/8/2005 

14.2 




3/10/2005 

16.7 

** 



3/17/2005 

15.8 




3/17/2005 

16 




3/25/2005 

15 



Table A2. Bootstrapped confusion matrices for forest summary classes, comparing all RF 
models to the base Hyper model. 


Model: Hyper 


Reference Data 




Predicted 

Other 

Mature Forest 

Secondary Forest 

Tree Plantations 

Total 

User Acc. 

Other 

19,718 

0 

1272 

605 

21,595 

91.6 

Mature Forest 

67 

19,054 

4533 

1068 

24,722 

77.5 

Secondary Forest 

80 

533 

10,905 

1200 

12,718 

86.2 

Tree Plantations 

135 

413 

3290 

17,127 

20,965 

82.1 

Total 

20,000 

20,000 

20,000 

20,000 


Overall: 

Prod. Acc. 

98.6 

95.3 

54.5 

85.6 


83.5 

Model: HyperLs 


Reference Data 




Predicted 

Other 

Mature Forest 

Secondary Forest 

Tree Plantations 

Total 

User Acc. 

Other 

19,719 

0 

1046 

663 

21,428 

92.3 

Mature Forest 

39 

19,231 

3161 

517 

22,948 

84.2 

Secondary Forest 

123 

521 

12,621 

1740 

15,005 

84.6 

Tree Plantations 

119 

248 

3172 

17,080 

20,619 

83.2 

Total 

20,000 

20,000 

20,000 

20,000 


Overall: 

Prod. Acc. 

98.6 

96.2 

63.1 

85.4 


85.8 

Model: HyperLC 


Reference Data 




Predicted 

Other 

Mature Forest 

Secondary Forest 

Tree Plantations 

Total 

User Acc. 

Other 

19,724 

0 

1247 

571 

21,542 

91.9 

Mature Forest 

52 

18,745 

1679 

395 

20,871 

90.1 

Secondary Forest 

107 

828 

13,904 

1926 

16,765 

83.5 

Tree Plantations 

117 

427 

3170 

17,108 

20,822 

82.6 

Total 

20,000 

20,000 

20,000 

20000 


Overall: 

Prod. Acc. 

98.6 

93.7 

69.5 

85.5 


86.9 

Model: HyperLsLC 


Reference Data 




Predicted 

Other 

Mature Forest 

Secondary Forest 

Tree Plantations 

Total 

User Acc. 

Other 

19,727 

0 

1150 

688 

21,565 

91.8 

Mature Forest 

41 

18,915 

647 

146 

19,749 

95.9 

Secondary Forest 

104 

837 

15,153 

2158 

18,252 

83.5 

Tree Plantations 

128 

248 

3050 

17,008 

20,434 

83.6 

Total 

20,000 

20,000 

20,000 

20,000 


Overall: 

Prod. Acc. 

98.6 

94.6 

75.8 

85.0 


88.5 
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Table A3. Bootstrapped accuracy confusion matrix for the HyperLsLC map in Figure 8. We 
show here the four summary classes disambiguated into their component classes, with 
producer’s and user’s accuracies estimated directly from the table. 



Banan 

a 

Hpalm 

Pina 

Cass. 

Soil 

Sand 

Cloud 

Past. 

Shade 

Urban 

Water 

Banana 

1685 

0 

0 

0 

19 

0 

0 

0 

0 

0 

0 

Hpalm 

0 

1780 

0 

30 

0 

0 

0 

11 

0 

0 

0 

Pina 

0 

0 

1795 

29 

17 

0 

25 

0 

0 

0 

0 

Cass. 

0 

0 

0 

1578 

13 

0 

26 

15 

0 

0 

0 

Soil 

0 

0 

0 

25 

1118 

23 

66 

9 

0 

0 

0 

Sand 

0 

0 

0 

0 

290 

1594 

0 

0 

0 

68 

20 

Cloud 

0 

0 

20 

27 

76 

25 

1518 

0 

0 

104 

39 

Past. 

15 

0 

17 

101 

96 

31 

47 

1600 

0 

99 

38 

Shade 

0 

0 

38 

37 

11 

0 

55 

41 

1957 

40 

134 

Urban 

0 

0 

0 

0 

135 

154 

0 

0 

0 

1481 

26 

Water 

0 

0 

0 

0 

9 

0 

0 

0 

0 

0 

1520 

Matfor 

0 

0 

0 

0 

0 

0 

33 

8 

0 

0 

0 

Swfor 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

Secfor 

17 

17 

0 

0 

0 

0 

0 

52 

0 

0 

18 

Citrus 

0 

0 

0 

0 

0 

0 

0 

12 

0 

0 

0 

Gmel. 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

Hier. 

0 

0 

0 

0 

0 

0 

0 

12 

0 

0 

0 

Tect. 

0 

0 

0 

0 

0 

0 

0 

24 

0 

0 

0 

Term. 

0 

0 

0 

0 

0 

0 

32 

14 

0 

0 

0 

Voch. 

14 

20 

0 

0 

0 

0 

0 

0 

0 

0 

0 

Total 

1731 

1817 

1870 

1827 

1784 

1827 

1802 

1798 

1957 

1792 

1795 

Prod. Acc. 

97.3 

98.0 

96.0 

86.4 

62.7 

87.2 

84.2 

89.0 

100.0 

82.6 

84.7 


Matfor 

Swfor 

Secfor 

Citrus 

Gmel. 

Hier. 

Tect. 

Term. 

Voch. 

Total 

User Acc. 

Banana 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1704 

98.9 

Hpalm 

0 

0 

295 

0 

88 

0 

114 

0 

0 

2318 

76.8 

Pina 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1866 

96.2 

Cass. 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1632 

96.7 

Soil 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1241 

90.1 

Sand 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1972 

80.8 

Cloud 

0 

0 

0 

65 

32 

0 

0 

0 

0 

1906 

79.6 

Past. 

0 

0 

701 

167 

71 

47 

50 

54 

0 

3134 

51.1 

Shade 

0 

0 

154 

0 

0 

0 

0 

0 

0 

2467 

79.3 

Urban 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1796 

82.5 

Water 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1529 

99.4 

Matfor 

9062 

608 

647 

0 

0 

38 

0 

108 

0 

10504 

86.3 

Swfor 

0 

9245 

0 

0 

0 

0 

0 

0 

0 

9245 

100.0 

Secfor 

698 

139 

15153 

114 

200 

515 

143 

938 

248 

18252 

83.0 

Citrus 

0 

0 

640 

3005 

0 

0 

46 

0 

82 

3785 

79.4 

Gmel. 

0 

0 

612 

0 

2893 

0 

0 

0 

0 

3505 

82.5 

Hier. 

0 

0 

0 

0 

0 

2585 

0 

0 

157 

2754 

93.9 
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Table A3. Cont. 



Matfor 

Swfor 

Secfor 

Citrus 

Gmel. 

Hier. 

Tect 

Term. 

Voch. 

Total 

User Acc. 

Tect. 

0 

0 

0 

0 

0 

0 

3009 

0 

0 

3033 

99.2 

Term. 

0 

0 

464 

0 

0 

0 

0 

2317 

70 

2897 

80.0 

Voch. 

248 

0 

1334 

0 

0 

100 

0 

0 

2744 

4460 

61.5 

Total 

10008 

9992 

20000 

3351 

3284 

3285 

3362 

3417 

3301 



Prod. Acc. 

90.5 

92.5 

75.8 

89.7 

88.1 

78.7 

89.5 

67.8 

83.1 
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